Differentially Private Distance Learning in Categorical Data
Abstract
Most privacy-preserving machine learning methods are designed around continuous or numeric data, but categorical attributes are common in many application scenarios, including clinical and health records, census and survey data. Distance-based methods, in particular, have limited applicability to categorical data, since they do not capture the complexity of the relationships among different values of a categorical attribute. Although distance learning algorithms exist for categorical data, they may disclose private information about individual records if applied to a secret dataset. To address this problem, we introduce a differentially private family of algorithms for learning distances between any pair of values of a categorical attribute according to the way they are co-distributed with the values of other categorical attributes forming the so-called context. We define different variants of our algorithm and show empirically that our approach consumes little privacy budget while providing accurate distances, making it suitable for distance-based applications such as clustering and classification.
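The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a DILCA-style context distance, d(x, y) = sqrt( sum over context attributes C, sum over values v of C, of (P(x | v) - P(y | v))^2, divided by the sum of |C| ), with each target-by-context contingency table released through the Laplace mechanism. Further assumptions: attribute domains are public, the context attributes are fixed in advance (not selected privately), neighbouring datasets differ by adding or removing one record (so each table has L1 sensitivity 1), and the budget is split evenly across context tables by sequential composition. The function names `noisy_conditionals` and `dp_context_distances` are hypothetical. Python's `random` module is used only for illustration and is not a secure noise source for production DP.

```python
import math
import random
from itertools import combinations


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_conditionals(records, target, ctx, domains, eps, rng):
    """Return {x: [P(target=x | ctx=v) for v in domains[ctx]]} from noisy counts.

    Assumption: add/remove-one neighbours, so one record changes one cell by 1
    (L1 sensitivity 1) and Laplace(1/eps) noise per cell gives eps-DP.
    Domains are assumed public and must cover every value in `records`.
    """
    counts = {(x, v): 0 for x in domains[target] for v in domains[ctx]}
    for r in records:
        counts[(r[target], r[ctx])] += 1
    # Add noise, then clip negatives; clipping is post-processing and keeps eps-DP.
    noisy = {key: max(0.0, n + laplace(1.0 / eps, rng)) for key, n in counts.items()}
    cond = {x: [] for x in domains[target]}
    for v in domains[ctx]:
        col = sum(noisy[(x, v)] for x in domains[target])
        for x in domains[target]:
            # Fall back to a uniform column if every noisy count was clipped to 0.
            p = noisy[(x, v)] / col if col > 0 else 1.0 / len(domains[target])
            cond[x].append(p)
    return cond


def dp_context_distances(records, target, context, domains, epsilon, seed=None):
    """Distances between all pairs of values of `target`; eps-DP overall (sketch).

    `context` must be a non-empty list of attribute names. seed=None draws
    fresh randomness; a fixed seed is only for reproducible testing.
    """
    rng = random.Random(seed)
    eps_each = epsilon / len(context)  # sequential composition across tables
    conds = [noisy_conditionals(records, target, c, domains, eps_each, rng)
             for c in context]
    norm = sum(len(domains[c]) for c in context)
    dist = {}
    for x, y in combinations(domains[target], 2):
        sq = sum((px - py) ** 2
                 for cond in conds
                 for px, py in zip(cond[x], cond[y]))
        dist[(x, y)] = math.sqrt(sq / norm)
    return dist
```

Values that always appear with the same context values end up close, while values with disjoint contexts end up far apart. Smaller `epsilon` or sparse contingency cells make the noise dominate, which is the accuracy/budget trade-off the abstract refers to.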
Related articles
Differentially Private Response Mechanisms on Categorical Data
We study mechanisms for differential privacy on finite datasets. By deriving sufficient sets for differential privacy we obtain necessary and sufficient conditions for differential privacy, a tight lower bound on the maximal expected error of a discrete mechanism and a characterisation of the optimal mechanism which minimises the maximal expected error within the class of mechanisms considered.
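As a concrete instance of the kind of discrete mechanism this entry analyses (our example, not necessarily one considered in that paper), k-ary randomized response maps an input from a finite set of k values to a randomized report. The sketch builds its mechanism matrix, checks the epsilon-DP condition P(o | i) <= e^epsilon * P(o | j) for every output o and inputs i, j, and evaluates the worst-case expected error. The choice of 0-1 loss as the error measure and all function names (`krr_matrix`, `satisfies_dp`, `max_expected_error`, `randomize`) are illustrative assumptions.

```python
import math
import random


def krr_matrix(k, eps):
    """Row i is the output distribution of k-ary randomized response on input i.

    The true value is kept with probability e^eps / (e^eps + k - 1); each of
    the other k - 1 values is reported with probability 1 / (e^eps + k - 1).
    """
    denom = math.exp(eps) + k - 1
    keep, other = math.exp(eps) / denom, 1.0 / denom
    return [[keep if i == j else other for j in range(k)] for i in range(k)]


def satisfies_dp(matrix, eps, tol=1e-12):
    """Check P(o | i) <= e^eps * P(o | j) for every output o and inputs i, j."""
    bound = math.exp(eps)
    for o in range(len(matrix[0])):
        col = [row[o] for row in matrix]
        if max(col) > bound * min(col) + tol:
            return False
    return True


def max_expected_error(matrix):
    """Worst case over inputs of the probability of a wrong report (0-1 loss)."""
    return max(1.0 - row[i] for i, row in enumerate(matrix))


def randomize(value, k, eps, rng):
    """Sample one k-RR report for `value` in range(k)."""
    if rng.random() < math.exp(eps) / (math.exp(eps) + k - 1):
        return value
    # Otherwise report one of the k - 1 other values uniformly at random.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


if __name__ == "__main__":
    rng = random.Random()
    m = krr_matrix(3, math.log(2))
    print(satisfies_dp(m, math.log(2)), max_expected_error(m))
    print([randomize(0, 3, math.log(2), rng) for _ in range(10)])
```

For k-RR the worst-case error under 0-1 loss is (k - 1) / (e^epsilon + k - 1), so it grows with the domain size and shrinks as the budget increases.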
Differentially private Bayesian learning on distributed data
Many applications of machine learning, for example in health care, would benefit from methods that can guarantee privacy of data subjects. Differential privacy (DP) has become established as a standard for protecting learning results, but the proposed algorithms require a single trusted party to have access to the entire data, which is a clear weakness. We consider DP Bayesian learning in a dis...
Differentially Private Online Learning
In this paper, we consider the problem of preserving privacy in the online learning setting. Online learning involves learning from the data in real-time, so that the learned model as well as its outputs are also continuously changing. This makes preserving privacy of each data point significantly more challenging as its effect on the learned model can be easily tracked by changes in the subseq...
Differentially Private Local Electricity Markets
Privacy-preserving electricity markets have a key role in steering customers towards participation in local electricity markets by guaranteeing to protect their sensitive information. Moreover, these markets make it possible to statically release and share the market outputs for social good. This paper aims to design a market for local energy communities by implementing Differential Privacy (DP)...
Context-Based Distance Learning for Categorical Data Clustering
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of the same categorical attribute, since they are not ordered. In this paper, we propose a method to learn a context-based distance for categorical attributes. The key intuition of this work is that the d...
Journal
Journal title: Data Mining and Knowledge Discovery
Year: 2021
ISSN: 1573-756X, 1384-5810
DOI: https://doi.org/10.1007/s10618-021-00778-0